Introduction to Ambisonics

Fundamentals

Chris Hold, Virtual Acoustics (Aalto University), Spring 2021

Structure

From Recording to Playback

Audio Description Formats

Channel Based Object Based Scene Based
Channel Object Scene

Benefits of Scene-based

Ambisonics

Example

A B A+B
$0.5$alt1 $0.5$alt2 $=$alt3

Example

A B A+B
$0.5$alt1 $0.5$alt2 $=$alt3

Example

A B A+B
$0.25$alt1 $0.75$alt2 $=$alt3

Example

A B A+B
$0.25$alt1 $0.75 e^{-j\pi/4}$alt2 $=$alt3

From the Wave Equation to Spherical Harmonics

Wave equation \begin{equation}\label{eq:waveeq} \nabla^2 p - \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} = 0 \quad. \end{equation}

In frequency domain, with wave-number $k = \frac{\omega}{c}$ results in the Helmholtz equation \begin{equation}\label{eq:helmholtz} (\nabla^2 + k^2) p = 0 \quad. \end{equation}

Multiple solutions, e.g. a mono chromatic plane wave with amplitude $\hat{A}(\omega)$ in Cartesian coordinates \begin{equation} p(t, x, y, z) = \hat{A}(\omega) e^{i(k_x x + k_y y + k_z z - \omega t)} \quad. \end{equation}

sfspw

Any solution to the Helmholtz equation can also be expressed in spherical coordinates $$ p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{darkgreen}{(A_{mn} j_n(kr) + B_{mn} y_n(kr))} \; \color{darkblue}{Y_{n}^{m}(\theta,\phi)} \quad, $$

$$ p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{darkgreen}{C_{mn}(kr)} \; \color{darkblue}{Y_{n}^{m}(\theta,\phi)} \quad, $$

With two separable parts:

sound field defined on a sphere is fully captured by its spherical harmonics coefficients

E.g. a unit plane wave \begin{equation} p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} 4 \pi i^n j_n(kr) \left[ Y_n^m(\theta_k,\phi_k) \right] ^* Y_{n}^ {m}(\theta,\phi) \quad. \end{equation}

Discovering Spherical Harmonics

$$ Y_n^m(\theta,\phi)=\color{darkorange}{\sqrt{\frac{2n+1}{4\pi}\frac{(n-m)!}{(n+m)!}}}\, \color{darkgreen}{P_n^m(\cos(\theta))}\, \color{darkred}{e^{im\phi}} \quad, $$$$ Y_n^m(\theta,\phi)=\color{darkorange}{D_{nm}}\, \color{darkgreen}{P_n^m(\cos(\theta))}\, \color{darkred}{e^{im\phi}} \quad, $$

where

Combined with appropriate scaling give real Spherical Harmonics $Y_{n,m}(\theta,\phi)$ as

$$ Y_{n,m}(\theta,\phi) = \sqrt{ \frac{(2n+1)}{4\pi} \frac{(n-|m|)!}{(n+|m|)!} } P_{n,|m|}(\cos(\theta)) \begin{cases} \sqrt2\sin(|m|\phi) & \mathrm{if\hspace{0.5em}} m < 0 \quad,\\ 1 & \mathrm{if\hspace{0.5em}} m = 0 \quad,\\ \sqrt2\cos(|m|\phi) & \mathrm{if\hspace{0.5em}} m > 0 \quad. \end{cases} $$
realSH

Let's look at the azimuthal component $e^{im\phi}$ and the zenithal component $P_n^m(\cos\theta)$

azimuthal component

zenithal component

Orthonormality

Two functions $\color{darkviolet}f, \color{darkred}g$ over a domain $\gamma$ are orthogonal if

$$ \int_\gamma \color{darkviolet}{f^*(\gamma)} \color{darkred}{g(\gamma)} \,\mathrm{d}\gamma = \langle \color{darkviolet}f, \color{darkred}g \rangle = 0 \quad,\,\mathrm{for}~f \neq g. $$

They are also orthonormal if $$ \color{darkviolet}{\int_\gamma f^*(\gamma) f(\gamma) \,\mathrm{d}\gamma = \int_\gamma|f(\gamma)|^2 \,\mathrm{d}\gamma = \langle f, f \rangle } = 1 \quad, \\ \color{darkred}{\int_\gamma g^*(\gamma) g(\gamma) \,\mathrm{d}\gamma = \int_\gamma |g(\gamma)|^2 \,\mathrm{d}\gamma = \langle g, g \rangle} = 1 \quad.$$

For the Spherical Harmonics:

Their product is still orthogonal, and the scaling $\color{darkorange}{D_{nm}}$ ensures orthonormality such that $$ \int_\Omega Y_n^m(\Omega)^* \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \langle Y_n^m(\Omega) , Y_{n'}^{m'}(\Omega) \rangle = \delta_{nn'}\delta_{mm'} \quad , $$

and

$$ \int_{{\Omega} \in \mathbb{S}^2} |Y_n^m({\Omega})|^2 \mathrm{d}{\Omega} = 1 \quad . $$

Example

show normality for $Y_0^0$: $$ Y_0^0(\theta,\phi)=\sqrt{\frac{0+1}{4\pi}\frac{(0)!}{(0)!}} P_0^0(\cos(\theta)) e^{i0\phi} = \sqrt{\frac{1}{4\pi}} \quad, $$ hence $$ \int_{{\Omega} \in \mathbb{S}^2} Y_0^0(\theta,\phi)^* Y_0^0(\theta,\phi) \,\mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \sqrt{\frac{1}{4\pi}}\sqrt{\frac{1}{4\pi}} \, \mathrm{d}{\Omega} = 4\pi \frac{1}{4\pi} = 1 \quad. $$

Example

Test azimuthal components $f_{azi} = e^{i 2 \phi}, \;g_{azi} = e^{i3 \phi}$,

and zenithal components $f_{zen} = P_1^1(\cos\theta), \;g_{zen} = P_2^1(\cos\theta)$ .

Spherical Harmonic Transform (SHT)

This can be expressed with $\Omega = [\phi, \theta]$ as the inverse Spherical Harmonic Transform (iSHT) $$ s({\Omega}) = \sum_{n = 0}^{N=\infty} \sum_{m=-n}^{+n} \sigma_{nm} Y_n^m({\Omega}) \quad. $$

Spherical harmonics coefficients $\sigma_{nm}$ can be derived with the Spherical Harmonic Transform (SHT) $$ \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = \langle [Y_n^m({\Omega})] , s({\Omega}) \rangle \quad. $$

Spherical Grids

The SHT evaluates the continuous integral over $\Omega$ $$ \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} \quad . $$ Quadrature methods allow evaluation by spherical sampling at certain (weighted) grid points such that \begin{equation} \sigma_{nm} \approx \sum_{q=1}^{Q} w_q s({\Omega}_q) [Y_n^m({\Omega}_q)]^* \quad. \end{equation}

Certain grids with sampling points $ {\Omega}_q $ and associated sampling weights $w_q$ have certain properties:

Spatial Dirac

We can show that spherical harmonics are orthogonal (even orthonormal) with $$ \int_\Omega Y_n^m(\Omega) \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \delta_{nn'}\delta_{mm'} \quad . $$

Because of their completeness, we can also directly formulate a spatial Dirac function on the sphere as $$ \sum_{n=0}^{N=\infty} \sum_{m=-n}^n [Y_n^m({\Omega'})]^* Y_n^m(\Omega) = \delta(\Omega - \Omega') \quad, $$

and therefore the spherical Fourier coefficients $\sigma_{nm}$ $$ SHT\{\delta(\Omega - \Omega')\} = \int_{{\Omega} \in \mathbb{S}^2} \delta(\Omega - \Omega') \, [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = [Y_n^m({\Omega'})]^* \quad . $$

Order-Limitation of Spatial Dirac Pulse

Example

Integrate (order-limited) Spatial Dirac $\delta_N(\Omega - \Omega')$ over sphere

$$ \int_{{\Omega} \in \mathbb{S}^2} \delta_N(\Omega - \Omega') \mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \color{darkblue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} \quad, $$

by discretization with sufficient t-design $$ \int_{{\Omega} \in \mathbb{S}^2} \color{darkblue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} = \frac{4\pi}{Q}\sum_{q=1}^{Q} \color{darkorange}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega_q)} $$

Matrix Notations

Stack the spherical harmonics evaluated at $\Omega$ up to spherical order $N$ as

$$ \mathbf{Y} = \left[ \begin{array}{ccccc} Y_0^0(\Omega[0]) & Y_1^{-1}(\Omega[0]) & Y_1^0(\Omega[0]) & \dots & Y_N^N(\Omega[0]) \\ Y_0^0(\Omega[1]) & Y_1^{-1}(\Omega[1]) & Y_1^0(\Omega[1]) & \dots & Y_N^N(\Omega[1]) \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ Y_0^0(\Omega[Q-1]) & Y_1^{-1}(\Omega[Q-1]) & Y_1^0(\Omega[Q-1]) & \dots & Y_N^N(\Omega[Q-1]) \end{array} \right] $$

such that $ \mathbf{Y} : [Q, (N+1)^2]$

Obtaining the discrete signals $s_q(t)$ is a linear combination of SH basis functions evaluated at $\Omega_q$.

This inverse transform in matrix notation with ambisonic signals matrix $\mathbf{\chi} : [1, (N+1)^2]$ is

$$ s_q(t) = \mathbf{\chi}(t) \, \mathbf{Y}^T \quad . $$

We obtain ambisonic signals matrix $\chi : [1, (N+1)^2]$ from signals $\mathbf{S} : [1, Q]$ by SHT as

$$ \mathbf{\chi}(t) = \mathbf{S}(t) \, \mathrm{diag}(w_q) \, \mathbf{Y} \quad. $$

From the Ambisonics Encoder, to Spatial Weighting, to a Decoder

Encoder

A single plane wave encoded in direction $\Omega$ with signal $\mathbf{s}$ is directly the outer product with the spatial Dirac coefficients

$$ \mathbf{\chi}_{PW(\Omega)} = \mathbf{s} \, \mathbf{Y}(\Omega) \quad. $$

For multiple sources $Q$, we stack and sum $$ \mathbf{\chi}_{PW(\Omega_Q)} = \sum_{q=1}^Q\mathbf{s}_q \, \mathbf{Y}(\Omega_q) = \mathbf{S} \, \mathbf{Y} \quad. $$

Spatial Weighting

The simplest beamformer is a spatial Dirac in direction $\Omega_k$ normalized by its energy, i.e. $\mathrm{max}DI$ $$ w_{nm, \mathrm{max}DI}(\Omega_k) = \frac{4\pi}{(N+1)^2} Y_{n,m}(\Omega_k) $$

Other patterns can be achieved by weighting the spherical Fourier spectrum. Axis-symmetric patterns reduce to only a modal weighting $d_n$, such that $$ w_{nm}(\Omega_k) = d_{n} \, Y_{n,m}(\Omega_k) $$

E.g. $\mathrm{max}\vec{r}_E$ weights each order with $$ d_{n,\,\mathrm{max}\vec{r}_E} = P_n[\cos(\frac{137.9^\circ}{N+1.51})] \quad, $$ with the Legendre polynomials $P_n$ of order $n$.

Decoder

$$ s({\Omega_k}) = \sum_{n = 0}^{N} \sum_{m=-n}^{+n} w_{nm}({\Omega_k}) \, \sigma_{nm} \quad, $$

or in matrix notation with $\mathbf{S} : [t, Q]$ and beamforming weights $\mathbf{d}_n$ $$ \mathbf{S} = \mathbf{\chi} \, \mathrm{diag_N}(\mathbf{d}_n) \, \mathbf{Y}^T \quad . $$

Example

Decode on a t-design(6) (sufficient up to $N = 3$):

Example

Decode on a t-design(6) (sufficient up to $N = 3$):

Loudspeaker Decoders

Binaural Decoding

$$ s^{l,r}(t) = x(t) * h_{\mathrm{HRIR}}^{l,r}({\Omega}, t) \quad, $$

where $(*)$ denotes the time-domain convolution operation.

Transforming to the time-frequency domain through the time-domain Fourier transform, further assuming plane-wave components $\bar X (\Omega)$, the ear input signals are given as $$ S^{l,r}(\omega) = \int_{\Omega} \bar X (\Omega, \omega) H_{nm}^{l,r}(\Omega, \omega) \,\mathrm{d}\Omega \quad. $$

$$ S^{l,r}(\omega) = \sum_{n = 0}^{N} \sum_{m = -n}^{+n} \breve X_{nm}(\omega) \breve H_{nm}^{l,r}(\omega) \quad. $$

For one ear (left) this can be interpreted as a frequency dependent ambisonic beamformer $$ s^l(\omega) = \chi_{nm}(\omega) [\breve H_{nm}^{l}(\omega)]^T \quad . $$